Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) #1563

Closed
joshkmartinez wants to merge 5 commits into openai:main from joshkmartinez:submission-gdn-hynrid

@joshkmartinez

Summary

3-Seed Results

| Seed | Steps | EMA BPB | val_bpb | XSA BPB | Artifact bytes |
|---|---|---|---|---|---|
| 42 | 1864 | 1.017723 | 1.026791 | 1.031731 | 15,313,984 |
| 1337 | 2239 | 1.007375 | 1.016586 | 1.020691 | 15,830,308 |
| 2024 | 2241 | 1.008736 | 1.017995 | 1.023138 | 15,820,201 |
| Mean | | 1.011278 | 1.02045733 | 1.025187 | 15,654,831.00 |
| Std (sample) | | | 0.00553017 | | |
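
The summary rows can be checked directly from the per-seed numbers. The sketch below recomputes the mean and sample standard deviation of `val_bpb` using only the three values reported above. The variable names are illustrative and not taken from the submission code.

```python
from statistics import mean, stdev

# Per-seed val_bpb values copied from the results table above.
val_bpb_by_seed = {42: 1.026791, 1337: 1.016586, 2024: 1.017995}

values = list(val_bpb_by_seed.values())
val_mean = mean(values)
val_std = stdev(values)  # sample standard deviation (n - 1 denominator)

print(f"val_bpb mean = {val_mean:.8f}")
print(f"val_bpb std  = {val_std:.8f}")
```

The reported `Std (sample)` value matches the `val_bpb` column under this calculation, not the EMA or XSA columns.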

Architecture

This submission uses an SP1024-tokenized GDN-Hybrid backbone with the following high-level structure:

[GDN×5] → SWA → [GDN×5] → SWA_shared
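
As a rough illustration, the layout above can be expanded into a flat layer schedule. This is a minimal sketch, not the submission's model code: the function name and parameter keys are hypothetical, and it assumes that `SWA_shared` means the final attention block reuses the weights of the first SWA block.

```python
def build_layout(gdn_per_group=5):
    """Expand [GDN×5] → SWA → [GDN×5] → SWA_shared into (kind, param_key) pairs."""
    layout = [("GDN", f"gdn_{i}") for i in range(gdn_per_group)]
    layout.append(("SWA", "swa"))
    layout += [("GDN", f"gdn_{i}") for i in range(gdn_per_group, 2 * gdn_per_group)]
    # Assumption: the shared block points at the same parameter key as the first SWA.
    layout.append(("SWA", "swa"))
    return layout


layout = build_layout()
depth = len(layout)                                   # layers applied in sequence
unique_param_sets = len({key for _, key in layout})   # distinct weight sets

print(depth, unique_param_sets)
```

Under that assumption the stack applies 12 layers while storing only 11 distinct sets of layer weights, which is one way weight sharing can reduce artifact size.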

Key components:

  1. SP1024 tokenizer
  2. Gated DeltaNet hybrid backbone
  3. Sliding-window attention side path
  4. MuonEq-R + AdamW training mix
  5. EMA = 0.997
  6. Late QAT threshold = 0.15
  7. GPTQ int6 + zstd-22 packaging
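
For readers unfamiliar with the EMA setting, here is a minimal sketch of an exponential moving average over model weights. It assumes that `EMA = 0.997` is a per-step decay applied to a shadow copy of the parameters; the source does not state the update rule, and `ema_update` is a hypothetical helper written with plain Python lists in place of tensors.

```python
def ema_update(ema, params, decay=0.997):
    """Return the new shadow weights: decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


# Toy run: the "trained" weights stay fixed, so the EMA converges toward them.
target = [1.0, -2.0]
ema = [0.0, 0.0]
steps = 1000
for _ in range(steps):
    ema = ema_update(ema, target)

print(ema)
```

With a decay of 0.997 the average weights roughly the last few hundred steps (on the order of 1 / (1 - 0.997) ≈ 333), so the shadow copy lags the live weights while smoothing out step-to-step noise.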

Credits

@joshkmartinez

Superseded by #1564, which carries the stronger cold-cache 3-seed run039-safe019 artifact at 1.01710033 BPB.
